This Project: "This web page originated as an assignment in Emory University's Biology 142 lab course. Students were assigned proteins of interest and asked to research what is known about the protein and to examine whether the newly sequenced whale shark genome had evidence of an orthologous protein."
Background:
CTSL2, also referred to as Cathepsin L2, is a member of the peptidase C1 family. It is most abundantly expressed in the thymus and testis. CTSL2 is a lysosomal cysteine proteinase that plays a key role in corneal physiology. The CTSL2 gene is detected in colorectal and breast carcinomas but not in normal colon, mammary gland, or peritumoral tissues, which suggests that it may play an important role in tumor processes.
Figure 1: Role of Cysteine Cathepsins in joint inflammation.
Methods/Approach:
Whale shark predicted orthologs
The FASTA sequence for the human protein (ENSP00000259470) was obtained from ensembl.org and used as the query in a BLAST search against the predicted whale shark protein database on the Galaxy server at http://whaleshark.georgiaaquarium.org/. Significant hits in the whale shark database were identified by looking for the lowest E-values and the longest alignment lengths. The full predicted sequences of the top 5 whale shark hits were then used as queries in protein BLASTs (blastp tab) against the NCBI human protein database to see whether they have human homologues. For each predicted protein, the name and E-value of the top hit were then reported.
Predicted orthologs
CTSL2 predicted orthologs were identified in non-whale-shark species using the NCBI BLAST server (http://blast.ncbi.nlm.nih.gov/Blast.cgi) in order to find similar sequences in other species. This was done by blasting the full predicted sequences of the top 5 predicted protein hits against the NCBI protein databases of non-whale-shark species to see whether there are homologues. Protein BLASTs were performed against single-species protein databases for mouse, zebrafish, rat, and African clawed frog. The human CTSL2 protein (ENSP00000410076) was used as the query sequence, and the best BLASTp hits were then selected and reported.
Phylogenetic tree
The sequences were aligned with the online ClustalW tool (http://www.genome.jp/tools/clustalw/) by entering all of them in FASTA format. To examine how the whale shark sequences are similar to or different from those of human and the other non-whale-shark species, the hit with the lowest E-value from each non-whale-shark species search was aligned together with the full predicted sequences of the top 5 whale shark hits, and the resulting multiple alignment was used to create a phylogenetic tree.
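Tree-building tools generally start from pairwise distances computed on the aligned sequences. The following is a minimal sketch of one such measure, the p-distance (the fraction of differing residues over ungapped columns). It is not the exact computation performed by the ClustalW server, and the three short aligned sequences and their names are hypothetical placeholders, not data from this project.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Return {(name1, name2): p-distance} for every pair of aligned sequences."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }


# Hypothetical toy alignment, for illustration only.
ALIGNED = {
    "seq_human": "MKTAY-LLQ",
    "seq_mouse": "MKTAYALLQ",
    "seq_shark": "MRSAY-LIQ",
}

if __name__ == "__main__":
    for pair, dist in distance_matrix(ALIGNED).items():
        print(pair, round(dist, 3))
```

Pairs with a small p-distance end up close together in a distance-based tree, which is why closely related sequences cluster on the same branch.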
Searching for CTSL2 in the whale shark:
The human CTSL2 protein sequence was used as the query in a BLAST search against the predicted whale shark protein database, and the results are shown in the table below. There were 5 hits with E-values below 2e-5. These 5 best hits (those with the lowest E-values) were then blasted against the NCBI human protein database to see whether they have human homologues.
| ID | Length | Top Hit Name | Top Hit E-value |
| --- | --- | --- | --- |
| g41441.t1 | 182 | leucine-rich repeat neuronal protein 2 precursor | 7.00E-130 |
| g32914.t1 | 85 | cathepsin L2 preproprotein | 4.00E-18 |
| g45165.t1 | 86 | cathepsin S isoform 2 preproprotein | 8.00E-19 |
| g42056.t1 | 77 | cathepsin B preproprotein | 2.00E-81 |
| g16082.t1 | 54 | cathepsin L2 preproprotein | 3.6 |
The table above shows the best BLASTp hits for the human CTSL2 protein sequence against the whale shark predicted protein database. Using the Galaxy server, the human CTSL2 protein sequence was used as the query in a BLAST search against the predicted whale shark protein database, and significant hits were identified by looking for the lowest E-values and the longest alignment lengths. The top 5 predicted protein hits (by E-value) are listed with their predicted protein database ID and amino acid length. Their full predicted sequences were also used as queries in protein BLASTs (blastp tab) against the NCBI human protein database to see whether they have human homologues. The name and E-value of the top hit (lowest E-value) for each predicted protein are reported in the last two columns.
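The hit-selection step described here (keep the hits with the lowest E-values, preferring longer alignments) can be sketched in a few lines of Python. This is only an illustration: it assumes the BLAST results were saved in the standard 12-column tabular format, which the source does not state, and the example rows and sequence names are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (assumed input format).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def top_hits(tabular_text, n=5):
    """Return the n subjects with the lowest E-values, one row per subject."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(tabular_text.strip()),
        fieldnames=BLAST_COLUMNS,
        delimiter="\t",
    )
    for row in reader:
        row["evalue"] = float(row["evalue"])
        row["length"] = int(row["length"])
        current = best.get(row["sseqid"])
        # Keep only the most significant alignment for each subject sequence.
        if current is None or row["evalue"] < current["evalue"]:
            best[row["sseqid"]] = row
    # Sort by E-value; break ties in favour of the longer alignment.
    return sorted(best.values(), key=lambda r: (r["evalue"], -r["length"]))[:n]


# Hypothetical rows, for illustration only (not results from this project).
EXAMPLE_ROWS = [
    ["query1", "subjA", "45.0", "180", "90", "3", "1", "180", "5", "184", "1e-30", "120"],
    ["query1", "subjB", "30.0", "60", "40", "2", "10", "70", "1", "60", "0.5", "25"],
    ["query1", "subjA", "40.0", "50", "30", "0", "200", "250", "190", "240", "1e-5", "45"],
    ["query1", "subjC", "38.0", "150", "85", "4", "1", "150", "2", "151", "2e-20", "90"],
]
EXAMPLE = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

if __name__ == "__main__":
    for hit in top_hits(EXAMPLE, n=2):
        print(hit["sseqid"], hit["evalue"], hit["length"])
```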
Two of the predicted whale shark proteins found with the human CTSL2 query returned CTSL2 as the best hit against the human protein database. However, we are not confident that g16082.t1 is a CTSL2 ortholog, because its E-value of 3.6 is not significant. g32914.t1 had a low E-value in the reciprocal BLAST (4.00E-18) and also returned CTSL2, so we consider this whale shark protein a CTSL2 ortholog. The top hit from the whale shark search, g41441.t1, had a low E-value (3E-24) and a high alignment length (187), so the whale shark is believed to have a homologue of this protein. In the reciprocal BLAST for that ID, the top hit had an E-value of 7.00E-130 with 39% identity and 58% query coverage, which means the match is very unlikely to be due to chance and suggests an orthologous relationship between the human and whale shark proteins, although the top human hit listed in the table for this ID is leucine-rich repeat neuronal protein 2 precursor rather than CTSL2. It is therefore likely that the protein in the whale shark is related to the protein in humans, but it might have changed earlier in the phylogeny.
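The reasoning in this paragraph follows a simple reciprocal-BLAST rule: a whale shark protein is treated as a candidate ortholog only if its best human hit is the original query protein and the E-value is significant. A minimal sketch of that rule, using the names and E-values from the table above, is shown below. The cutoff of 1e-5 is an assumed threshold chosen for illustration; the source does not state a formal cutoff.

```python
# Reciprocal BLAST results copied from the table above:
# (whale shark ID, top human hit name, top hit E-value)
RECIPROCAL_HITS = [
    ("g41441.t1", "leucine-rich repeat neuronal protein 2 precursor", 7.00e-130),
    ("g32914.t1", "cathepsin L2 preproprotein", 4.00e-18),
    ("g45165.t1", "cathepsin S isoform 2 preproprotein", 8.00e-19),
    ("g42056.t1", "cathepsin B preproprotein", 2.00e-81),
    ("g16082.t1", "cathepsin L2 preproprotein", 3.6),
]

EVALUE_CUTOFF = 1e-5  # assumed significance threshold, not taken from the source


def classify(hit_name, evalue, target="cathepsin L2", cutoff=EVALUE_CUTOFF):
    """Label one reciprocal BLAST result relative to the target protein."""
    if target not in hit_name:
        return "top hit is a different protein"
    if evalue > cutoff:
        return "returns target, but not significant"
    return "candidate ortholog"


if __name__ == "__main__":
    for shark_id, hit_name, evalue in RECIPROCAL_HITS:
        print(shark_id, "->", classify(hit_name, evalue))
```

Under these assumptions only g32914.t1 is labelled a candidate ortholog, and g16082.t1 returns CTSL2 without reaching significance, which matches the two cases discussed in the text.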
Protein Domain:
Figure 2. This gene belongs to the C1 peptidase family, which contributes proteolytic activity to the digestive vacuoles and to the lysosomal system of eukaryotic cells.
Orthologs
The human CTSL2 protein sequence (ENSP00000259470) was used in NCBI BLAST searches against several individual species protein databases, as shown in Table 2. The most common result was the cathepsin L1 or cathepsin L2 protein, though Arabidopsis (with an extremely high E-value) yielded the cysteine protease SAG12 protein. This suggests that CTSL2 arose very early in protein evolution, as related proteins were found in most of the species it was tested against.
Table 2. Best hits with human CTSL2 protein BLAST. Includes the Species, name, ID, Length, and E-value. Highlighted entries represent the closest hits from the BLAST results.
Phylogeny
The best hits from the protein database searches (including the whale shark database) using the human CTSL2 protein were used to create a phylogenetic tree. The whale shark proteins tested were similar to one another and to Arabidopsis, while the other species were related to one another and contained either the cathepsin L1 or cathepsin L2 gene.
Figure 3. Phylogenetic tree of CTSL2 best hits. The tree contains the best hits from the BLAST searches, including the best 2 hits from the whale shark database. The ClustalW2 algorithm was used to analyze the data. The “frog” result denotes the African clawed frog.
Conclusions:
Based on the results gathered, the gene coding for the CTSL2 protein appears to be identifiable in the whale shark genome. There was also enough evidence to indicate that this protein has orthologs in other species. The species in which an ortholog was found include mouse, zebrafish, rat, and frog, similar to the other protein sequences that were part of this project. The presence of the CTSL2 domain in all of these species suggests that it has been conserved since their last common ancestor. The role of this protein in humans is not completely understood. As described previously, the protein is most abundantly found in the thymus and testis, where it acts as a lysosomal cysteine proteinase. Notably, the gene is often detected in colorectal and breast carcinomas but not in normal tissue, suggesting that it is important in tumor processes. Finding the gene coding for CTSL2 could open the door to a better understanding of the function of this protein in carcinomas, first in smaller model organisms and then in larger organisms such as humans and whale sharks. This, in turn, could aid in the treatment of these tumors. The main purpose of the project, to examine whether the whale shark genome shows evidence of an orthologous protein, was also accomplished.
References:
"603308- Cathepsin L2; CTSL2." Online Mendelian Inheritance in Man. Web. 13 Apr. 2015.
Chen, N., Seiberg, M., & Lin, C. B. (2006). Cathepsin L2 Levels Inversely Correlate with Skin Color. J Invest Dermatol, 126(10), 2345-2347.
"CTSV cathepsin V [Homo sapiens (human)]." National Center for Biotechnology Information. Web. 13 Apr. 2015.
"Effects of Methionine Supplementation on the Expression of Protein Deposition-Related Genes in Acute Heat Stress-Exposed Broilers." National Center for Biotechnology Information. Web. 13 Apr 2015.
Spira, D., Stypmann, J., Tobin, D. J., Petermann, I., Mayer, C., Hagemann, S., . . . Reinheckel, T. (2007). Cell Type-specific Functions of the Lysosomal Protease Cathepsin L in the Heart. Journal of Biological Chemistry, 282(51), 37045-37052.
Del Vesco, A. P., Gasparino, E., Grieser, D. O., Zancanela, V., Voltolini, D. M., Khatlab, A. S., . . . Neto, A. R. O. (2015). Effects of Methionine Supplementation on the Expression of Protein Deposition-Related Genes in Acute Heat Stress-Exposed Broilers. PLoS ONE, 10(2), e0115821.
Paik, S., Tang, G., Shak, S., Kim, C., Baker, J., Kim, W., . . . Wolmark, N. (2006). Gene Expression and Benefit of Chemotherapy in Women With Node-Negative, Estrogen Receptor–Positive Breast Cancer. Journal of Clinical Oncology, 24(23), 3726-3734.